ABSTRACT The genomes of sulfate-reducing bacteria remain poorly characterized, largely due to a paucity of experimental data 
and genetic tools. To meet this challenge, we generated an archived library of 15,477 mapped transposon insertion mutants in 
the sulfate-reducing bacterium Desulfovibrio alaskensis G20. To demonstrate the utility of the individual mutants, we profiled 
gene expression in mutants of six regulatory genes and used these data, together with 1,313 high-confidence transcription start 
sites identified by tiling microarrays and transcriptome sequencing (5' RNA-Seq), to update the regulons of Fur and Rex and to 
confirm the predicted regulons of LysX, PhnF, PerR, and Dde_3000, a histidine kinase. In addition to enabling single mutant 
investigations, the D. alaskensis G20 transposon mutants also contain DNA bar codes, which enable the pooling and analysis of 
mutant fitness for thousands of strains simultaneously. Using two pools of mutants that represent insertions in 2,369 unique 
protein-coding genes, we demonstrate that the hypothetical gene Dde_3007 is required for methionine biosynthesis. Using com- 
parative genomics, we propose that Dde_3007 performs a missing step in methionine biosynthesis by transferring a sulfur group 
to O-phosphohomoserine to form homocysteine. Additionally, we show that the entire choline utilization cluster is important 
for fitness in choline sulfate medium, which confirms that a functional microcompartment is required for choline oxidation. 
Finally, we demonstrate that Dde_3291, a MerR-like transcription factor, is a choline-dependent activator of the choline utiliza- 
tion cluster. Taken together, our data set and genetic resources provide a foundation for systems-level investigation of a poorly 
studied group of bacteria of environmental and industrial importance. 

IMPORTANCE Sulfate-reducing bacteria contribute to global nutrient cycles and are a nuisance for the petroleum industry. De- 
spite their environmental and industrial significance, the genomes of sulfate-reducing bacteria remain poorly characterized. 
Here, we describe a genetic approach to fill gaps in our knowledge of sulfate-reducing bacteria. We generated a large collection of 
archived transposon mutants in Desulfovibrio alaskensis G20 and used the phenotypes of these mutant strains to infer the func- 
tion of genes involved in gene regulation, methionine biosynthesis, and choline utilization. Our findings and mutant resources 
will enable systematic investigations into gene function, energy generation, stress response, and metabolism for this important 
group of bacteria. 
Sulfate-reducing bacteria (SRB) are a diverse group of bacteria 
that can use sulfate as a terminal electron acceptor for growth. 
This method of energy conservation is considered to be an ancient 
form of respiration: it is estimated that SRB-mediated sulfate re- 
duction has existed for ~3 billion years and was an important 
process during the early stages of life on earth (1). SRB are found 
in many diverse environments and contribute to the global sulfur 
and carbon cycles, including the mineralization of organic carbon 
in sea sediments (2). SRB are also common inhabitants of the 
human microbiome (3, 4), where they may play a role in inflam- 
matory bowel disease (5). 

SRB are important in a number of industries and applications. 
In the oil industry, SRB contribute to the souring of oil via the 
production of sulfides and corrosion of pipelines and wells (6). In 
wastewater treatment plants, SRB are used to remove sulfates and 
convert hydrogen sulfide by-products into precipitated heavy 
metals (7). Similarly, SRB play an important role in bioremedia- 
tion by reducing and immobilizing heavy metals (8). Lastly, SRB 
hold potential for use in biological fuel cells to generate energy (9). 

The SRB Desulfovibrio alaskensis G20, formerly known as De- 
sulfovibrio desulfuricans G20, is derived from the G100A strain 
isolated from an oil well in California (10). Relative to the G100A 
strain, the D. alaskensis G20 strain is a spontaneously nalidixic 
acid-resistant mutant that is also cured of the native plasmid pBG1 
(11). The sequenced D. alaskensis G20 genome has been annotated 
with proteomic and transcript data to improve gene calls (12). 
D. alaskensis G20 is quite distant from a well-studied SRB of the 
same genus, Desulfovibrio vulgaris Hildenborough; their 16S rRNA 
sequences share 90% sequence similarity and D. alaskensis G20 
shares 1,873 of its 3,258 protein-coding genes with D. vulgaris 
Hildenborough. Genetic tools based on homologous recombina- 
tion, including markerless deletions and epitope tagging, are 
available for D. vulgaris Hildenborough (13, 14), but such tools 
have yet to be developed in D. alaskensis G20. The D. alaskensis 
G20 genome is predicted to contain 133 transcription factors and 
sigma factors (15). To date, only four of these transcription factors 
have been characterized experimentally, ArsR (16), MreC (17), 
SahR (18), and Dde_1614 (19). However, using comparative 
genomics, DNA binding motifs and target genes have been pre- 
dicted for 50 of these regulators (20-23). 

Here, we describe the generation and preliminary analysis of a 
collection of D. alaskensis G20 transposon insertion mutants that 
have been tagged with DNA bar codes for high-throughput anal- 
ysis of mutant fitness using competition assays. The transposon 
insertion location has been mapped for the entire collection, and 
the collection is archived to also allow single mutant investigations 
for the majority of genes. The D. alaskensis G20 transposon col- 
lection includes insertion mutants in 2,513 protein-coding genes 
and has already been used to investigate the suboptimality of gene 
expression in bacteria (24) and syntrophic growth of D. alaskensis 
G20 with methanogens (25). Another D. alaskensis G20 DNA bar- 
coded transposon collection has previously been described (26) 
and has been used to identify genes important for fitness in sedi- 
ment (26, 27), H2 oxidation (28), and syntrophic growth with a 
methanogen (29). However, the Groh et al. collection is about a 
third of the size of our collection and has limited capacity for 
parallel analysis of mutant fitness because only 66 unique bar 
codes were used (26). In addition, only a fraction of the trans- 
poson insertions of the Groh et al. collection have been mapped, 
so the entire collection typically has to be screened for a phenotype 
before a follow-up study can begin (29). 

In this paper, we highlight the utility of the D. alaskensis G20 
transposon collection for generating insights into SRB gene essen- 
tiality, gene regulation, and metabolism. In addition to genes di- 
rectly involved in sulfate reduction, we identified genes in folate, 
thiamine, and menaquinone synthesis as essential in D. alaskensis 
G20. To experimentally validate and update computationally pre- 
dicted regulons, we measured gene expression in individual trans- 
poson mutants of regulatory genes. To augment the analysis of 
these expression data, we mapped the architecture of the 
D. alaskensis G20 transcriptome and identified 1,313 transcription 
start sites (TSSs) with high-density tiling microarrays and tran- 
scriptome sequencing (5' RNA-Seq). Using the combined TSS 
and gene expression data, we updated the regulons of σ54, Fur, and 
Rex and confirmed the expected regulons of LysX, PerR, PhnF, 
and the histidine kinase Dde_3000. Taking advantage of DNA bar 
codes introduced into the transposon mutants, we generated two 
pools of D. alaskensis G20 transposon mutants for the parallel 
analysis of mutant fitness. Using the competitive fitness assay and 
single-gene validation, we demonstrated that the conserved hypo- 
thetical gene Dde_3007, which belongs to the uncharacterized 
family DUF39, is required for methionine synthesis and specifi- 
cally for homocysteine synthesis. Lastly, we used the competitive 
fitness assay to verify that most genes of the choline utilization 
cluster are required for the anaerobic oxidation of choline and 
confirm, through expression analysis, that Dde_3291 regulates 
this gene cluster. As described here, our comprehensive collection 
of D. alaskensis G20 mutants is a valuable resource for the systems- 
level investigation of SRB physiology, both as single mutants and 
in a pooled fitness assay. 

RESULTS AND DISCUSSION 

D. alaskensis G20 transposon mutant collection and analysis of 
essential genes. To enable systems-level investigation of a sulfate- 
reducing bacterium, we isolated and mapped the insertion loca- 
tion for 15,477 D. alaskensis G20 Tn5 mutants. Of the mutants, 
14,834 were isolated on lactate-sulfate medium and the other 643 
were isolated on lactate-sulfite medium. These mutants are main- 
tained as an archived collection of individual strains and are avail- 
able to the community for single-gene studies (see Data Set S1 in 
the supplemental material). As shown in Fig. 1A, the 15,477 trans- 
poson insertions are distributed roughly evenly across the chro- 
mosome but with some bias toward the origin. The 15,477 
mapped mutants include insertions in 2,513 of the 3,258 (77%) 
protein-coding genes in the D. alaskensis G20 genome. For 2,314 
genes, we mapped an insertion within the central portion (5 to 
80%) of the gene (Fig. 1B). 
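
The central-insertion criterion can be sketched as a small helper. This is an illustrative sketch, not the authors' mapping pipeline: the function name, the 1-based inclusive coordinate convention, and the strand handling are assumptions, and only the 5 to 80% window is taken from the text.

```python
def insertion_class(gene_start, gene_end, strand, insertion_pos,
                    lower=0.05, upper=0.80):
    """Classify a transposon insertion relative to a single gene.

    Assumed convention: gene_start <= gene_end, both 1-based and inclusive.
    The fractional position is measured from the 5' end of the gene, so it
    is flipped for genes on the minus strand.
    """
    if not gene_start <= insertion_pos <= gene_end:
        return "outside"
    length = gene_end - gene_start + 1
    if strand == "+":
        offset = insertion_pos - gene_start
    else:
        offset = gene_end - insertion_pos
    fraction = offset / length
    # Central insertions fall within 5 to 80% of the gene length;
    # anything else inside the gene counts as an edge insertion.
    return "central" if lower <= fraction <= upper else "edge"


# Hypothetical 1,000-nt gene used only to exercise the helper.
print(insertion_class(1001, 2000, "+", 1500))
print(insertion_class(1001, 2000, "-", 1100))
```

Measuring from the 5' end matters because the excluded window is asymmetric: the last 20% of a gene is treated as an edge, but only the first 5%.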

Protein-coding genes with no mapped insertions may be es- 
sential for viability in the medium that we used to select the mu- 
tants (primarily lactate-sulfate). We categorized 337 D. alaskensis 
G20 genes with no insertions in the middle of the coding sequence 
(CDS) (defined as between 5 and 80% of CDS length) and with 
sequence similarity to a known essential gene in other bacteria as 
"expected essential" (see Materials and Methods). In addition, we 
identified 50 Desulfovibrio-specific essential genes that met the 
following criteria: the gene had to (i) have an ortholog in Desulfo- 
vibrio vulgaris Hildenborough, Desulfovibrio vulgaris Miyazaki, 
and Desulfovibrio desulfuricans ATCC 27774; (ii) not share signif- 
icant homology to an essential gene contained in the OGEE data- 
base (30); (iii) be adjacent to and cotranscribed with another es- 
sential gene; and (iv) be at least 300 nucleotides in length. We 
considered genes conserved among these four members of the 
Desulfovibrio genus as functionally important and therefore more 
likely to be essential than less conserved genes without insertions. 
Short genes of less than 300 nucleotides, genes that contain repet- 
itive elements that cannot be uniquely mapped, and genes without 
central insertions and without an ortholog to a known essential 
gene or adjacent to another essential gene in the same operon were 
not considered essential (Fig. 1B). We used the operon criterion as 
a filter to help identify essential genes because genes in the same 
operon often have similar functions. A complete list of expected 
and Desulfovibrio-specific essential genes is contained in Data 
Set S2 in the supplemental material. 
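
The classification rules described above can be summarized as a decision function. The sketch below is one possible reading of those rules, not the authors' code: the record fields, the function name, and the order in which the filters are applied are assumptions, and the homology and operon calls are taken as precomputed booleans.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    length_nt: int
    has_central_insertion: bool         # insertion within 5 to 80% of the CDS
    in_repetitive_region: bool          # insertions cannot be uniquely mapped
    similar_to_known_essential: bool    # homology to a known essential gene
    conserved_in_desulfovibrio: bool    # ortholog in all three reference genomes
    cotranscribed_with_essential: bool  # adjacent to an essential gene, same operon


def essentiality_class(gene: GeneRecord) -> str:
    """Assign a gene to one of the categories described in the text."""
    if gene.has_central_insertion:
        return "dispensable"
    # Short or repetitive genes are not called essential (assumed to be
    # filtered before the homology and operon criteria are checked).
    if gene.length_nt < 300 or gene.in_repetitive_region:
        return "unknown"
    if gene.similar_to_known_essential:
        return "expected essential"
    if gene.conserved_in_desulfovibrio and gene.cotranscribed_with_essential:
        return "Desulfovibrio-specific essential"
    return "unknown"
```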

Some of the Desulfovibrio-specific essential genes were antici- 
pated based on their known, vital role in energy conservation. 
These genes include (i) the quinone-interacting membrane- 
bound oxidoreductase qmoABC (Dde_1111:4) (31), (ii) the 
adenylyl-sulfate reductase component aprB (Dde_1109), (iii) dis- 
similatory sulfite reductase dsrAB (Dde_0526:7), and (iv) trans- 
membrane electron carrier components dsrMKJP. dsrO also 
lacked insertions but shared enough homology with nrfC from 
Salmonella enterica serovar Typhimurium LT2 to be considered 
an expected essential. We mapped transposon insertions in two 
genes known to be essential for sulfate reduction in the fraction of 
the collection that we isolated in lactate-sulfite medium: sat (sul- 
fate adenylyltransferase; Dde_2265) and aprA (adenylyl-sulfate re- 
ductase; Dde_1110). 

FIG 1 Coverage of the D. alaskensis G20 transposon mutant collection. (A) Distribution of 15,477 mapped transposon insertions along the chromosome. (B) 
Number of protein-coding genes that are essential, are dispensable and have an insertion, or are of unknown essentiality. Edge insertions represent genes with an 
insertion(s) in either the first 5% or last 20% of the gene. Repetitive elements are nonunique regions of D. alaskensis G20 in which it is hard to map transposon 
insertion sites. Putative essential genes were subcategorized as expected essential or Desulfovibrio-specific essential. nt, nucleotides. 

We also classified a number of genes involved in the biosynthe- 
sis of the cofactors NAD (nadAC), folate (folCPKD, Dde_2197), 
and menaquinone as either expected or Desulfovibrio-specific es- 
sentials. nadB did not make either list of essentials but also lacks 
transposon insertions. Desulfovibrio genomes do not contain an 
annotated dihydroneopterin aldolase (encoded by the folB gene) 
of the typical folate synthesis pathway. However, it has been dem- 
onstrated that 6-pyruvoyl tetrahydrobiopterin synthase paralogs, 
including DVU1352 from D. vulgaris Hildenborough, function- 
ally rescue Escherichia coli folB mutants (32). Consistent with these 
observations, we classified Dde_2197, a putative 6-pyruvoyl tetra- 
hydrobiopterin synthase and ortholog of DVU1352, as essential. 
An alternative pathway for menaquinone synthesis that uses futa- 
losine as an intermediate has been described in Streptomyces spe- 
cies (33), and orthologs of these Streptomyces enzymes were clas- 
sified as putative essentials in D. alaskensis G20 (Dde_0796:0799, 
Dde_3188, Dde_3185, Dde_1392, Dde_1323, and Dde_0150). 

Identification of 1,313 D. alaskensis G20 transcriptional 
start sites with a high-resolution transcription map. To aid in 
the interpretation of the transposon mutant fitness data and regu- 
lon inference from gene expression profiling, as described below, 
we collected high-resolution tiling microarray and 5' RNA-Seq 
data from D. alaskensis G20 to identify operons, promoter motifs, 
and transcriptional start sites (TSSs). A representative 7-kb win- 
dow of the D. alaskensis tiling microarray and 5' RNA-Seq data is 
illustrated in Fig. 2A. To identify D. alaskensis G20 promoter mo- 
tifs, we examined the upstream regions of 1,172 preliminary TSSs 
identified from the 5' RNA-Seq and tiling microarray data and 
found two significant motifs, for σ70 (642 instances, P < 10^-440) 
and σ54 (RpoN; 20 instances, P < 10^-15) (Fig. 2B and C). The 
D. alaskensis G20 σ70 motif is very similar to the σ70 motif that we 
previously identified in D. vulgaris Hildenborough (34). Com- 
pared to the E. coli σ70 motif, the D. alaskensis G20 σ70 motif has a 
shortened -10 box and a stronger -35 box, which confirms our 
previous findings in D. vulgaris Hildenborough (34). 



To identify new RpoN targets in D. alaskensis G20, we scanned 
the sequences upstream of the 1,172 preliminary TSSs with the 
Desulfovibrio RpoN motif from RegPrecise (21). From this analy- 
sis, we identified 11 new RpoN-dependent promoters that were 
previously unannotated in RegPrecise: Dde_2287:Dde_2285, 
Dde_0420:Dde_0418, Dde_3398 (at codon 13 within the open 
reading frame [ORF]), Dde_0818:Dde_0819, Dde_0062, 
Dde_1408, Dde_1017, Dde_0645, Dde_1501, and two unanno- 
tated small RNAs starting at positions 3627439 on the plus strand 
and 86651 on the minus strand (Fig. 2A). In contrast to σ70 and 
σ54, we did not identify a motif that corresponds to the remaining 
D. alaskensis G20 sigma factor, RpoH, nor did we find TSSs at the 
expected locations given the predictions in RegPrecise. We spec- 
ulate that RpoH is not active under the growth conditions that we 
used for transcriptome analysis. 

Using the identified D. alaskensis G20 σ70 and σ54 promoter 
motifs in combination with the tiling microarray and 5' RNA-Seq 
data, we applied a semisupervised machine learning approach to 
identify genuine TSSs (see Materials and Methods). At a false dis- 
covery rate of 3%, we identified 1,313 high-confidence TSSs in 
D. alaskensis G20 (see Data Set S3 in the supplemental material for 
a full list). 

Validating and expanding D. alaskensis G20 regulons by ex- 
pression profiling. A key challenge in microbial systems biology is 
mapping and modeling the gene regulatory networks of environ- 
mental bacteria. Despite the success of comparative genomics for 
predicting gene regulation in Desulfovibrio (20, 21, 23), the major- 
ity of D. alaskensis G20 regulators remain without predictions; 
most computational predictions are not experimentally verified; 
and even if a motif prediction exists, all targets may not be iden- 
tified, as we demonstrated for RpoN. To address these challenges 
and to demonstrate the utility of the archived transposon mutant 
collection for targeted single-gene investigations, we measured 
gene expression in mutants of lysX, fur, rex, Dde_3000, perR, and 
phnF to validate and expand their predicted regulons. 

FIG 2 Identifying D. alaskensis G20 promoter motifs with transcriptome map. (A) A 7-kb region of the D. alaskensis G20 genome with tiling microarray data 

(i) Lysine utilization regulator (LysX). LysX (Dde_2665) is a 
putative regulator of lysine utilization (23), and our tiling data 
confirm that lysX is cotranscribed in an operon with lysA. In ad- 
dition to lysXA, LysX is predicted to regulate the lysine transporter 
LysW and the uncharacterized protein Dde_2468. In a defined 
medium with no lysine present, we observed little effect of the lysX 
mutant on gene expression relative to wild-type D. alaskensis G20 
(Fig. 3A). However, in a defined medium with lysine, the lysX 
mutant strain had strongly increased expression of lysXA and lysW 
(Fig. 3B). Therefore, in the presence of excess lysine, it appears 
that LysX represses the last step in lysine biosynthesis (LysA) and 
the uptake of lysine (LysW). As D. alaskensis G20 is not believed to 
catabolize lysine, repressing the uptake of excess lysine may be 
adaptive. The expression of Dde_2468 did not respond to the pres- 
ence of lysine, but the expression of the divergently transcribed 
gene Dde_2469 was altered in the lysX mutant (Fig. 3B). It is pos- 
sible that binding of LysX to the site between Dde_2468:Dde_2469 
affects the expression of Dde_2469 and not Dde_2468. 

(ii) Ferric uptake regulator (Fur). In a mutant for fur 
(Dde_2676), the ferric uptake regulator, most of the RegPrecise- 
predicted targets are strongly upregulated (Fig. 3C). Using the 
expression data and high-confidence TSSs, we identified two new 
members of the Fur regulon, Dde_3146:Dde_3144 and Dde_1239 
(Fig. 3C), which encode hypothetical proteins, are induced in the 
fur mutant, and have Fur sites near the TSSs. Two predicted Fur 
targets, Dde_0753 (fur3) and Dde_0133 (bfr), are downregulated 
in the fur mutant, but their respective TSSs are near the Fur sites, 
so these predictions are still likely to be correct. Alternatively, Fur3 
is a paralog of Fur (46% identity), so its downregulation could 
indicate that fur3 (and possibly bfr) is actually regulated by Fur3 
and that the activity of Fur3 increases in a fur mutant background. 
In our expression data, fur is expressed more highly than fur3 and 
fur but not fur3 shows strong fitness effects (24), so we expect that 
Fur is the major regulator. Finally, the expression of the predicted 
Fur targets Dde_2805:Dde_2807 and Dde_2677:Dde_2676 did not 
change in the fur mutant. There is little expression of Dde_2805 in 
our transcriptomic data, so we cannot evaluate the Fur site relative 
to the TSS. The Fur site upstream of Dde_2677 is proximal to a 
TSS, but there is also read-through from the upstream genes, so we 
cannot draw a clear conclusion in this case either. 

(iii) Redox-responsive repressor (Rex). The redox-responsive 
repressor Rex regulates energy metabolism in a wide range of bac- 
teria (22). In a rex mutant, we found that many predicted targets 
were upregulated as expected, but by less than 2-fold ("targets 1" 
in Fig. 3D). To confirm that these mild effects were specific to the 
rex mutant, we compared the expression data from the rex mutant 
to the expression data from other mutant strains. More precisely, 
we used linear regression to fit log2 expression levels in the rex 
mutant, using data from all of the other strains that were mea- 
sured with the same array design (including wild-type D. alasken- 
sis G20). Effects that cannot be predicted by this model are more 
likely to be directly due to the disruption of rex as opposed to 
subtle variations in growth conditions. A comparison of the 
model to the rex mutant data confirmed many of the expected 
targets. These include genes that are essential for sulfate reduction, 
namely, qmoABCD (Dde_1111:Dde_1114), sat (Dde_2265), ade- 
nylate kinase (Dde_2028), and pyrophosphatase (Dde_1178). 
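
The regression step described above can be illustrated with ordinary least squares, treating each gene as an observation and each reference strain as a predictor. This is a minimal sketch under stated assumptions: the text does not specify whether an intercept or any regularization was used, so a plain intercept model is assumed here, and both function names are hypothetical.

```python
def fit_linear_model(predictors, response):
    """Ordinary least squares with an intercept, via the normal equations.

    predictors: list of columns, one per reference strain, each a list of
    log2 expression values (one per gene).
    response: log2 expression values in the mutant of interest.
    Returns [intercept, coef_1, ..., coef_k].
    """
    n = len(response)
    columns = [[1.0] * n] + [list(col) for col in predictors]
    k = len(columns)
    # Augmented normal-equation matrix [X^T X | X^T y].
    aug = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
           + [sum(a * y for a, y in zip(columns[i], response))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def residuals(predictors, response):
    """Observed minus fitted log2 expression for every gene."""
    coefs = fit_linear_model(predictors, response)
    fitted = [coefs[0] + sum(c * col[g] for c, col in zip(coefs[1:], predictors))
              for g in range(len(response))]
    return [obs - fit for obs, fit in zip(response, fitted)]
```

Genes with large positive residuals are those whose expression in the mutant is higher than the other strains would predict, which is the signal used to separate regulator-specific effects from variation shared across cultures.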
FIG 3 Validating and expanding D. alaskensis G20 regulons. In each panel, the y axis shows the normalized log2 expression levels in regulator mutants: LysX (A 
and B), Fur (C), Rex (D), Dde_3000 (E and F), PerR (G), or PhnF (H and I). In most panels, the x axis shows the normalized log2 expression of wild-type 
D. alaskensis G20 (G20). In panels F and I, the x axis represents the expected expression level from a linear model that includes the expression data from the other 
mutant strains and wild-type D. alaskensis G20. The putative targets for each regulator are color coded. For PhnF, we averaged the expression data from two 
different mutant strains. 



Conversely, other energy-production genes in the predicted 
Rex regulon were not induced in the mutant ("targets 2" in 
Fig. 3D), including sulfite reductase (dsrABD), adenylyl-sulfate 
reductase (apsAB), transmembrane complex (dsrMKJOP) (35), 
and type 1 cytochrome c3:menaquinone oxidoreductase (qrcABCD) 
(36). This might suggest that these genes are not actually 
targets of Rex, but their Rex sites are well conserved in other De- 
sulfovibrio species (21). Additionally, studies with purified Rex 
protein from D. vulgaris Hildenborough confirmed that Rex binds 
some of these sites in vitro (J. Wall, personal communication). 
Instead, the lack of a response for these genes in the rex mutant 
seems to indicate a more complex mechanism of regulation. Two 
predicted target operons, dhcA-rnfDGEABF (Dde_0580: 
Dde_0587) and hysBA (Dde_2134:Dde_2135), were downregu- 
lated in the rex mutant, but these are predicted to be under com- 
plex regulation by other regulators as well. We removed the gene 
downstream of hysBA, Dde_2136, from the Rex regulon, as the 
tiling microarray data suggested that it is transcribed separately. 
Finally, some of the genes induced in the rex mutant that were not 
in the original regulon prediction have strong hits to the Rex motif 
near their TSS. Therefore, we added Dde_0552, Dde_1140, 
Dde_1591:Dde_1590, and Dde_2058 to the Rex regulon. 

(iv) Putative histidine kinase (Dde_3000). Our tiling mi- 
croarray data confirm that the putative histidine kinase Dde_3000 
is cotranscribed in an operon with the DNA binding response 
regulator Dde_3003. The ortholog of Dde_3003 in D. vulgaris 
Hildenborough, DVU2934, has a single specific binding site up- 
stream of lpxC (37). Furthermore, a binding motif for DVU2934 
was identified and confirmed by gel shift assays, and this motif is 
present upstream of lpxC in D. alaskensis G20 (37). Thus, we pre- 
dict that Dde_3000 signals to Dde_3003 to control the expression 
of lpxC (Dde_2986). Consistent with this view, lpxC was strongly 
upregulated in the Dde_3000 mutant strain (Fig. 3E). Another 
possibility is that the insertion of a transposon within Dde_3000 
would decrease the expression of Dde_3003 in the mutant strain, 
but we did not observe any decrease in the expression of 
Dde_3003. After taking expression data from other mutant strains 
into account with a linear regression, lpxC seems to be the only 
gene that is strongly upregulated in the Dde_3000 mutant 
(Fig. 3F). We examined some of the other outliers but did not find 
any hits to the response regulator's motif. Thus, we confirmed that 
Dde_3000 signals to Dde_3003, and it appears that lpxC is the only 
target of Dde_3003, as in D. vulgaris Hildenborough. Dde_3003 is 
a predicted σ54-dependent transcriptional activator. Consistent 
with this, lpxC has a σ54 binding site (CGGCACGATTATTGCT) 
just upstream of the TSS, and the predicted binding site of Rajeev 
et al. (37) for Dde_3003 (GTGTAAAAAACACACA) is centered at 
-101 relative to the TSS. Since lpxC is upregulated in the 
Dde_3000 mutant, this implies that during growth in LS4D, 
Dde_3000 reduces the activity of Dde_3003. 

(v) Peroxide-sensing repressor (PerR). PerR (Dde_3674) is a 
peroxide-sensing repressor involved in the regulation of oxidative 
stress. In a D. vulgaris Hildenborough perR (DVU3095) mutant, 
the predicted targets were derepressed during lactate-sulfate 
growth (38). Similarly, we observed that all four of the predicted 
members of the PerR regulon of D. alaskensis G20 (Dde_1143, 
Dde_1222, Dde_1320, and Dde_3674) were strongly induced in 
the perR mutant grown in lactate-sulfate medium (Fig. 3G). While 
additional genes changed expression in the mutant, we did not 
find hits to the PerR motif upstream of these genes, and so these 
probably result from indirect effects. 

(vi) Phosphonate utilization (PhnF). We measured expres- 
sion in two mutants of phnF (Dde_3327), which encodes a puta- 
tive regulator of phosphonate utilization (20). Similarly, in Myco- 
bacterium smegmatis, a homolog of PhnF represses phosphonate 
utilization genes (39). Expression data from the D. alaskensis G20 
phnF mutants were poorly correlated with data from the wild type 
and hence were hard to interpret (Fig. 3H). After comparison of 
the expression data from the phnF mutants to data from all other 
mutant strains using the regression model, it appears that all of the 
expected PhnF targets (Dde_3328:Dde_3336) are expressed more 
highly in the phnF mutants (Fig. 3I). Thus, our data 
confirm that Dde_3327 encodes a repressor of phosphonate utili- 
zation genes in D. alaskensis G20. 

In each of the above examples, either we validated the pre- 
dicted regulons for repressors using our baseline medium or we 
took advantage of the predicted signal for the regulator to profile 
gene expression under a physiologically relevant condition 
(LysX). To extend this workflow to the de novo discovery of new 
regulons for activators, the relevant signal should be first identi- 
fied prior to expression profiling of the single regulatory mutant 
strain. In instances where the signal(s) is unknown, high- 
throughput mutant fitness profiling, such as described below for 
the choline utilization regulator, can be used to identify these sig- 
nals. Given the scale on which these mutant fitness assays can be 
performed (40), this general workflow holds promise for uncov- 
ering new regulons. 

Competitive fitness assays identify Dde_3007, a novel auxo- 
troph required for methionine biosynthesis. To characterize 
nonessential genes in D. alaskensis G20, we constructed two pools 
of mutants and performed competitive fitness assays to simulta- 
neously measure the fitness of 2,369 genes (40, 41). To calculate 
"gene fitness" scores for each gene, we averaged the fitness values 
for the insertion strains of the same gene, as described previously 
(40). Negative gene fitness scores are indicative of genes whose 
mutations result in reduced fitness relative to the typical strain in 
the pools. To validate this approach in D. alaskensis G20, we com- 
pared the fitness of 2,369 genes in LS4D versus LS4D supple- 
mented with Casamino Acids. As expected, the fitness defects of 
many predicted amino acid biosynthesis genes were rescued with 
the addition of Casamino Acids (Fig. 4A). 
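
The gene-level scoring and the rescue comparison described above can be sketched as follows. The averaging step follows the text; everything else is assumed for illustration: the function names, the input dictionaries, and the two thresholds used to call a rescue are hypothetical and are not values from this study.

```python
from collections import defaultdict
from statistics import mean


def gene_fitness(strain_fitness, strain_to_gene):
    """Average the fitness values of all insertion strains in the same gene."""
    per_gene = defaultdict(list)
    for strain, value in strain_fitness.items():
        gene = strain_to_gene.get(strain)
        if gene is not None:  # skip strains without a mapped gene
            per_gene[gene].append(value)
    return {gene: mean(values) for gene, values in per_gene.items()}


def rescued_genes(fitness_minimal, fitness_supplemented,
                  defect=-1.0, recovered=-0.5):
    """Genes with a fitness defect in minimal medium that largely disappears
    when the medium is supplemented. Both thresholds are arbitrary
    placeholders chosen for this sketch."""
    return sorted(
        gene for gene, score in fitness_minimal.items()
        if score <= defect and fitness_supplemented.get(gene, score) >= recovered
    )


# Made-up values for two hypothetical genes, for illustration only.
strains = {"s1": -2.0, "s2": -3.0, "s3": 0.5}
mapping = {"s1": "geneA", "s2": "geneA", "s3": "geneB"}
print(gene_fitness(strains, mapping))
```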

Because the methionine synthesis pathway in Desulfovibrio is 
still unknown (42), we used the competitive, pooled mutant fit- 
ness assay to identify auxotrophs specifically rescued by the addi- 
tion of methionine. In addition to the expected methionine bio- 
synthesis genes hom (Dde_2731) and metH (Dde_2115), 
supplementation of minimal medium with methionine also res- 
cued the fitness defects of the uncharacterized genes Dde_2711 
and Dde_3007 (Fig. 4B). The D. alaskensis G20 MetH is missing 
the N-terminal "activation" domain [for reducing Co(II) to 
Co(I)] that is present in E. coli MetH. To identify this activity in 
D. alaskensis G20, we examined the new methionine auxotrophs 
identified by our fitness assay and found that Dde_2711 encodes a 
predicted ferredoxin and has homology to this missing activation 
domain of E. coli MetH. Dde_3007 encodes a conserved protein 
annotated as domain of unknown function DUF39. To determine 
if Dde_3007 is required for methionine biosynthesis, we comple- 
mented the methionine auxotrophy of a Dde_3007 mutant strain 
with a plasmid-carried copy of wild-type Dde_3007 (Fig. 4C). In 
the absence of the complementation plasmid, the addition of me- 
thionine or homocysteine also rescued the Dde_3007 mutant 
(Fig. 4D). In contrast, the addition of O-succinylhomoserine, 
L-homoserine, O-acetylhomoserine, or cystathionine did not res- 
cue the methionine auxotrophy of the Dde_3007 mutant (data not 
shown). Taken together, these results suggest that Dde_3007 per- 
forms a step in methionine synthesis between L-homoserine and 
homocysteine. 

FIG 4 Dde_3007 is required for methionine biosynthesis in D. alaskensis G20. (A) Comparison of gene fitness for 2,379 genes in LS4D minimal medium (x axis) 
versus LS4D minimal medium supplemented with 0.2% (wt/vol) Casamino Acids. Genes putatively involved in methionine (blue) and amino acid biosynthesis 
plus homocysteine medium. (E) Predicted pathway of methionine biosynthesis in D. alaskensis G20. Unknown enzymes are marked in red. 

We used these mutant fitness results to predict the methionine 
biosynthesis pathway in D. alaskensis G20 (Fig. 4E). Dde_2048 
(lysC), an aspartate kinase, and Dde_0254 (asd), an aspartate- 
semialdehyde dehydrogenase, show only moderately reduced fit- 
ness in minimal medium (Fig. 4B), possibly due to redundancy in 
the D. alaskensis G20 genome (i.e., proAB [Dde_1633, Dde_2689] 
or argBC [Dde_2015, Dde_3455]). The uncertainty in the pathway 
remains between L-homoserine and homocysteine, as D. alasken- 
sis G20 lacks the metB and metC genes of the classic methionine 
biosynthesis pathway from E. coli. In D. alaskensis G20, we pro- 
pose that L-homoserine is activated to O-phosphohomoserine by 
an unknown enzyme(s). We have indirect evidence that 
O-phosphohomoserine is a metabolite in the D. alaskensis G20 
methionine biosynthesis pathway: O-phosphohomoserine serves 
as a common metabolite for threonine and methionine synthesis 
in Methanococcus jannaschii (43), and D. alaskensis G20 contains 
ThrC (Dde_0171), the enzyme that converts 
O-phosphohomoserine to threonine. In addition, the new en- 
zyme identified here as putatively part of the methionine biosyn- 
thesis pathway, Dde_3007, has a homolog in M. jannaschii. We 
propose that Dde_3007 performs a step in the methionine biosyn- 
thesis pathway between the activated L-homoserine and homocys- 
teine intermediates (Fig. 4D; see below for full explanation). 
D. alaskensis G20 contains two predicted methionine synthase 
genes, a vitamin B12-independent enzyme encoded by metE 
(Dde_2328) and a vitamin B12-dependent enzyme encoded by 
metH (Dde_2115). metE does not have a significant phenotype in 
minimal medium and is probably not the predominant methio- 
nine synthase in D. alaskensis G20 under our growth conditions. 
In contrast, metH mutants have reduced fitness in minimal me- 
dium but are only moderately rescued by the addition of methio- 
nine (Fig. 4B). One potential reason for the incomplete rescue of 
the metH mutant with methionine is that there are not enough 
methyl groups in the mutant to obviate the need for the 
S-adenosyl-L-methionine (SAM) cycle. 

Comparative analysis of Dde_3007 (DUF39). Orthologs of 
Dde_3007 are found in other organisms which are known to synthesize
methionine but which do not contain known genes for 
transforming L-homoserine to homocysteine, including DET0921 
in Dehalococcoides ethenogenes 195 (44) and MJ0100 in Methano- 
coccus jannaschii DSM 2661 (43). The ortholog of Dde_3007 in 
M. jannaschii, MJ0100, also contains a CBS domain that has been 
shown to sense SAM (45). This CBS domain is absent from 
Dde_3007, so we speculate that the enzyme is under feedback 
inhibition by SAM in M. jannaschii but not in D. alaskensis G20. 
Orthologs of Dde_3007 are sometimes found in close proximity to 
a putative homoserine kinase (Dester_DRAFT_0700 from Desul- 
furobacterium thermolithotrophum BSA, or ThenaDRAFT_1089 
from Thermodesulfobium narugense Na82), which suggests that 
Dde_3007 is not the missing homoserine kinase but rather has 
another role. Furthermore, orthologs of Dde_3007 are often found 
adjacent to a ferredoxin domain or fused to it (e.g., THA_1098 in 
Thermosipho africanus). Dde_3007 orthologs are also often adja- 
cent to COG2122; unfortunately, our mutant collection does not 
contain an insertion within the representative in D. alaskensis G20 
(Dde_2535), but this family contains an ApbE-like domain that is 
probably a flavin transferase (46). The proximity to these genes 
suggests that Dde_3007 participates in a redox reaction. Indeed, a 
biochemical study of methionine synthesis in M. jannaschii sug- 
gested that methionine synthesis in that organism proceeds from 
O-phosphohomoserine and that protein-bound persulfide might 
be the sulfur source, with the sulfur being transferred via a redox 
reaction (43). However, Dde_3007 and its relatives do not have 
FIG 5 Identification of genes required for choline utilization in D. alaskensis G20. (A) Scatter plot of gene fitness values in lactate-sulfate medium (x axis) versus 
choline-sulfate medium (y axis). Genes are color coded according to the legend in panel C. (B) Comparison of gene expression for wild-type D. alaskensis G20 
(x axis) and a transposon mutant of Dde_3291 (y axis; strain JK05048) grown in choline-sulfate medium. Genes are color coded according to the legend in panel 
C. (C) Same as panel B for growth in lactate-sulfate medium. 



any conserved cysteines, so they are unlikely to be persulfide carriers. 
Overall, we propose that Dde_3007 participates in the reductive 
transfer of a sulfur group to O-phosphohomoserine to form ho- 
mocysteine. 

Dde_3007 is part of a larger family, variously known as domain 
of unknown function 39 (DUF39), COG1900, or PF01837
(http://pfam.sanger.ac.uk/family/PF01837). Members of this family are 
sometimes annotated as IMP dehydrogenase, but according to the 
Pfam curators, this annotation is spurious. The genomes of some 
methanogens contain two members of this family, one of which 
may be an ortholog of Dde_3007 and the other of which is often in 
proximity to genes that are involved in the synthesis of coenzyme 
M. For example, in Methanoculleus marisnigri JR1, MemarJJl Wis 
a member of DUF39 and is adjacent to genes encoding cysteate 
synthase (47) and sulfopyruvate decarboxylase (a fused ComDE) 
(48). Subsequent steps in coenzyme M synthesis involve the transfer
of a sulfide group to sulfoacetaldehyde to form coenzyme 
M, but the genes involved are not known. So, we propose that 
other members of DUF39 are involved in the transfer of a sulfide 
group to sulfoacetaldehyde to form coenzyme M. 

A microcompartment is required for choline utilization in 
D. alaskensis G20. D. alaskensis G20 can grow by coupling the 
oxidation of choline to the reduction of sulfate (10). Recently, 
Craciun and Balskus identified a lyase in D. alaskensis G20 (CutC; 
Dde_3282), which cleaves choline to form trimethylamine and the 
toxic metabolite acetaldehyde (49). The acetaldehyde is probably 
further oxidized to acetate, which is coupled to sulfate reduction. 
In addition, they used comparative genomics to identify a larger, 
16-kb gene cluster (termed the choline utilization or cut cluster) 
containing cutC and a number of other genes predicted to be in- 
volved in choline oxidation, including components of a micro- 
compartment thought to be necessary for acetaldehyde sequestra- 
tion (49). To systematically identify D. alaskensis G20 genes 
required for choline oxidation, we compared fitness data from the 
mutant pools grown with either choline or lactate as the carbon 
source. As illustrated in Fig. 5A, 16 cut cluster genes are required 
for choline utilization in D. alaskensis G20, including aldehyde 
dehydrogenases, alcohol dehydrogenases, and several microcom- 
partment shell proteins. Therefore, our results demonstrate ge- 
netically that a microcompartment and acetaldehyde detoxifica- 
tion are required for choline oxidation in D. alaskensis G20. 

In addition to the cut cluster, we identified additional genes 
important for choline utilization, including Dde_3288:Dde_3291, 
which are adjacent to and divergently transcribed from the cut 
cluster (Fig. 5A). The putative role of Dde_3291, a putative regu- 
lator, is described below. We also identified an acetaldehyde:ferre- 
doxin oxidoreductase (Dde_2460), with a molybdenum or tung- 
sten cofactor, which lies outside the cut cluster and shows a 
choline-specific fitness defect and may be responsible for detoxi- 
fication outside the microcompartment (Fig. 5A). Alternatively, 
the D. alaskensis G20 microcompartment might disproportionate 
acetaldehyde to acetylphosphate and ethanol, as proposed for the 
ethanolamine utilization microcompartment of Salmonella (50). 
In this case, the soluble acetaldehyde dehydrogenase would be 
involved in reoxidizing the ethanol to acetate, which would be 
coupled to sulfate reduction. 

Dde_3291 regulates choline utilization in D. alaskensis G20. 
Our mutant fitness data suggested that Dde_3291, a MerR family 
transcriptional activator, might be an activator of the choline uti- 
lization genes (Fig. 5A). Our tiling microarray data (collected with 
lactate as the carbon source) confirmed that Dde_3291 is part of an 
operon (Dde_3288:Dde_3291) that is expressed in the absence of 
choline, while the rest of the cut gene cluster (Dde_3284: 
Dde_3264) is weakly expressed during growth with lactate. By 
comparing sequences upstream of Dde_3284, Dde_3288, 
Dde_3291, and their homologs in Desulfovibrio salexigens and 
D. desulfuricans, we identified a palindromic motif,
CnTTCCCCnnnnGGGGAAnG, with sites in D. alaskensis G20 upstream 
of Dde_3288 and Dde_3284. The motif upstream of Dde_3284 is 
centered at -23 relative to the TSS, which is expected for MerR-type
activators that bind between the -10 and -35 boxes (51). 
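
The palindromic character of this motif, and a simple scan for consensus matches, can be illustrated with a short sketch. This is not the procedure used in the study: the function names and the example sequence are hypothetical, "n" is treated as any nucleotide, and only exact matches to the consensus are reported, so weaker sites would be missed.

```python
import re

MOTIF = "CnTTCCCCnnnnGGGGAAnG"  # consensus from the text; n = any base

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "n": "n"}

def reverse_complement(seq):
    """Reverse complement; 'n' (any base) maps to itself."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def motif_to_regex(motif):
    """Build a regex string in which 'n' matches A, C, G, or T."""
    return "".join("[ACGT]" if base == "n" else base for base in motif)

def find_sites(sequence, motif=MOTIF):
    """Return 0-based start positions of exact consensus matches."""
    # A lookahead is used so that overlapping matches are also reported.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# A palindromic DNA motif equals its own reverse complement.
is_palindromic = reverse_complement(MOTIF) == MOTIF

# Hypothetical sequence (not from the G20 genome) with one planted site.
example = "ATATCATTCCCCACGTGGGGAATGATAT"
sites = find_sites(example)
```

A position weight matrix scan would be the more usual choice for detecting degenerate sites; the regular expression is used here only to keep the sketch short.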

To test the hypothesis that Dde_3291 regulates the cut cluster, 
we collected expression data from a Dde_3291 transposon mutant 
and D. alaskensis G20 wild type after transfer to either a defined 
lactate-sulfate medium or choline-sulfate medium. By collecting 
expression data 1 h after transfer, we hoped to observe changes in 
gene expression without biasing the experiment by the reduced 
growth of the Dde_3291 mutant strain in choline-sulfate medium. 
Our results show that the Dde_3291 transposon mutant has 
greatly reduced expression of the cut cluster genes with choline as 
a carbon source (Fig. 5B), but not with lactate (Fig. 5C). There- 
fore, Dde_3291 activates the expression of choline utilization 
genes (cut cluster) in the presence of choline as a carbon source. 

In wild-type D. alaskensis G20 cells with choline, we observed 
diminishing expression along the length of the putative Dde_3288: 
Dde_3264 (cut) operon (correlation of the position in the operon 
versus the log2 ratio, r = 0.93, P < 10^-7). The expression of the 
downstream genes in the cut cluster was also less sensitive to the 
mutation in Dde_3291, with Dde_3267:Dde_3264 showing little 
upregulation in the mutant background with choline (Fig. 5B). 
We propose that Dde_3291 regulates the initiation of transcripts 
upstream of Dde_3284 and that nonspecific termination, along 
with weak transcription from internal promoters, leads to less of 
an effect on the expression of the far downstream genes. 

The expression data also suggested that Dde_3039, a paralog of 
the choline-trimethylamine lyase cutC, might be regulated by 
Dde_3291 (Fig. 5B). The expression pattern of Dde_3039 does not 
seem to be an artifact of cross-hybridization, as the expression 
effect was just as strong even after removing the data from 28 (out 
of 125) potentially cross-hybridizing probes. Additionally, we 
found a weak hit to the Dde_3291 motif
(gaacCCcTtCCCcTTAcGGGAgGGTtgc) upstream of Dde_3039. Overall, it seems likely 
that Dde_3291 directly regulates Dde_3039. However, the func- 
tion of Dde_3039 remains unclear, as our fitness data show that it 
is not important for choline utilization (Fig. 5A). 

Conclusion. Here, we present a comprehensive transposon 
mutant library of Desulfovibrio alaskensis G20 as a genetic resource 
for investigating gene function in sulfate-reducing bacteria. The 
transposon mutant collection enables targeted investigation of 
single genes, which we used to confirm the predicted regulons of 
LysX, PhnF, PerR, and Dde_3000 as well as to update the regulons 
of Fur and Rex. Additionally, because the transposon mutants 
were engineered to contain DNA bar codes, pooled mutant fitness 
assays with the D. alaskensis G20 mutants can be used to quickly 
generate lists of candidate genes, which can be followed up using 
the mapped and archived collection. We used this workflow to 
identify Dde_3007, a novel gene required for methionine biosyn- 
thesis, and Dde_3291, a regulator of choline utilization in 
D. alaskensis G20. Given the ease and scalability of the pooled 
mutant fitness assay, it is now feasible to assess the mutant fitness 
for each D. alaskensis G20 gene across hundreds of diverse condi- 
tions to globally infer gene function, as we have previously dem- 
onstrated in Shewanella oneidensis MR-1 (40). In summary, high- 
throughput and targeted investigations with the D. alaskensis G20 
transposon mutant collection can be used to uncover key genes 
and pathways in this environmentally and industrially important 
but poorly studied group of bacteria. 

MATERIALS AND METHODS 

Strains, media, and culturing. Desulfovibrio alaskensis G20 was a gift of 
Judy Wall (University of Missouri). The E. coli conjugation donor strain 
WM3064 was a gift of William Metcalf (University of Illinois). D. alaskensis
G20 was typically grown in an anaerobic chamber (Coy Laboratories, 
Grass Lake, Michigan) with an atmosphere of nitrogen, carbon dioxide, 
and hydrogen (90:5:5) at 30°C. For the mutant pool experiments, we grew 
the cultures in Hungate tubes that were filled and capped in the anaerobic 
chamber and incubated outside the chamber in the dark at 30°C. For 
culturing D. alaskensis G20 in lactate-sulfate medium, we used two varia- 
tions of Postgate's medium C (1): LS4D (52) and MOLS4 (31). LS4D 
(pH 7.2) contained 60 mM sodium lactate, 50 mM sodium sulfate, 8 mM 
magnesium chloride, 20 mM ammonium chloride, 2.2 mM potassium 
chloride (added after autoclaving), 0.6 mM calcium chloride, 30 mM 
PIPES [piperazine-N,N'-bis(2-ethanesulfonic acid)] buffer, trace minerals,
and vitamins (53). For LS4D, we used resazurin as a redox indicator 
before autoclaving and titanium citrate as a reductant just prior to inoc- 
ulation. To make the rich medium LS4, we supplemented LS4D with 0.1% 
(wt/vol) yeast extract. Rich lactate-sulfite medium (LS3) is identical to 
LS4, except that we reduced the concentration of sodium lactate to 
15 mM, omitted the sodium sulfate, and added 10 mM sodium sulfite. 
MOLS4 (pH 7.2) contains 60 mM sodium lactate, 30 mM sodium sulfate, 
8 mM magnesium chloride, 20 mM ammonium chloride, 2 mM potas- 
sium chloride, 0.6 mM calcium chloride, 30 mM Tris-HCl buffer 
(pH 7.4), trace minerals, iron(II) chloride (0.06 mM)-EDTA (0.12 mM) 
solution, and vitamins. We added 0.1% (wt/vol) yeast extract to MOLS4 
to make the rich medium MOYLS4. MOCS4 is the same as MOLS4 except 
that we replaced the sodium lactate with 30 mM choline chloride and 
reduced the concentration of sodium sulfate to 15 mM. For MOLS4, 
MOYLS4, and MOCS4, we added hydrogen sulfide to a final concentra- 
tion of 1 mM as a reductant just prior to inoculation. All media were 
autoclaved and moved to the anaerobic chamber before cooling. For 
plates, we used the same medium formulations except that we added agar to 
a final concentration of 1.5% (wt/vol). We placed agar plates in the anaer- 
obic chamber for 1 day prior to use. For culturing the diaminopimelic acid 
(DAP) auxotroph WM3064, we supplemented LB with DAP to a final 
concentration of 300 µM. 

Transposon mutagenesis. We previously published detailed methods 
that describe the DNA bar code (TagModule) collection (54) and the use 
of these TagModules to generate DNA-bar-coded transposon mutants in 
Shewanella oneidensis MR-1 (40) and Zymomonas mobilis ZM4 (41). The 
same methods were used to generate the D. alaskensis G20 transposon 
mutant collection. Each TagModule contains two unique 20-bp DNA 
sequences, termed the UPTAG and DOWNTAG, which are flanked by 
common PCR priming sequences. We cloned these TagModules into the 
mini-Tn5 transposon delivery vector pRL27 (55), as previously described 
(54). We created the D. alaskensis G20 transposon mutant collection by 
conjugating wild-type D. alaskensis G20 with the E. coli donor strain 
WM3064 harboring the TagModule-marked pRL27 transposon delivery 
vectors. With minor modifications, we used a previously described con- 
jugation protocol (26). Briefly, we combined mid-log-phase D. alaskensis 
G20 and WM3064 in a single Eppendorf tube, pelleted the cells by
centrifugation, and resuspended the cell pellet in 20 µl of LS4 medium. This 
concentrated mixture of cells was conjugated for 16 h at 30°C on a nylon 
filter (0.2-µm pore size; Supelco) on an LS4 agar plate. Postconjugation, 
we transferred the nylon filter to 3 ml of LS4 medium, inverted the tubes 
several times to remove the cells from the filter, incubated the cells for 6 h 
at 30°C, and plated the cells on LS4 plates supplemented with 400 µg/ml 
G418. We picked single, G418-resistant colonies into the wells of a 96-well 
microplate containing 500 µl of LS4 and 800 µg/ml G418 per well. After 
growth to stationary phase, we added glycerol to a final concentration of 
10% (vol/vol) for long-term storage of the transposon mutants at -80°C. 
For 643 mutants, we replaced LS4 medium with LS3 medium for all trans- 
poson mutagenesis steps. For each transposon mutant, we mapped the 
transposon insertion location and identified the TagModule using a two- 
step arbitrary PCR and sequencing protocol, as previously described (54). 
See Table S1 in the supplemental material for a list of all primers used in 
this study. In total, we picked 21,696 colonies for the D. alaskensis G20 
collection and mapped the transposon insertion location for 15,477 mutant
strains. See Data Set S1 for a complete list of the D. alaskensis G20 
transposon collection. 

Identification and classification of D. alaskensis G20 essential genes. 

We classified a D. alaskensis G20 protein-coding gene as an expected es- 
sential if (i) the gene had an ortholog in Desulfovibrio vulgaris Hildenbor- 
ough, Desulfovibrio vulgaris strain Miyazaki, and Desulfovibrio desulfuri- 
cans ATCC 27774; (ii) no transposon was mapped to the central (5 to 
80%) portion of the gene; and (iii) the gene had a significant BLAST hit 
(>30% identity) in the OGEE database of essential genes (30) or has an 
ortholog (using unique COGs or TIGRfam) of an essential gene in either 
E. coli (56) or Bacillus subtilis (57). We classified protein-coding genes as 
putative Desulfovibrio-specific essentials if the genes met the first two cri- 
teria described above and shared an operon with and were adjacent to 
another Desulfovibrio-specific or expected essential. Additionally, to be 
classified as a Desulfovibrio-specific essential, the gene had to be at least 
300 nucleotides long. We used a gene length cutoff of 300 nucleotides 
because, given the number of mutants mapped to and the length of the 
genome, we would expect a transposon insertion every 241 nucleotides. 
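
The 241-nucleotide spacing follows from dividing the genome length by the number of mapped mutants. The sketch below reproduces that arithmetic and adds a rough estimate of how often a nonessential gene would be missed by chance. It rests on assumptions that are not stated in this section: a genome size of about 3.73 Mb, uniformly random insertion (a Poisson model), and a central target region covering 75% of the gene. The function names are hypothetical.

```python
import math

GENOME_LENGTH_NT = 3_730_000   # assumed approximate genome size, not stated in this section
MAPPED_MUTANTS = 15_477        # mapped insertion strains (from the text)

def expected_spacing(genome_length, n_insertions):
    """Average distance between insertions if they were evenly spread."""
    return genome_length / n_insertions

def prob_no_central_insertion(gene_length, spacing, central_fraction=0.75):
    """Poisson chance that the central 5 to 80% of a nonessential gene is missed."""
    target = gene_length * central_fraction
    return math.exp(-target / spacing)

spacing = expected_spacing(GENOME_LENGTH_NT, MAPPED_MUTANTS)

# Chance of seeing no central insertion purely by luck, for a short and a long gene.
p_miss_300 = prob_no_central_insertion(300, spacing)
p_miss_1500 = prob_no_central_insertion(1500, spacing)
```

Under these assumptions, a 300-nucleotide gene lacks a central insertion by chance roughly 40% of the time, whereas a 1,500-nucleotide gene does so about 1% of the time. This is why the absence of insertions alone is weak evidence for short genes and why the additional conservation and operon criteria are applied.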

Gene expression with tiling microarrays and 5' RNA-Seq. We performed
D. alaskensis G20 tiling microarray (NimbleGen) experiments on 
mid-exponential-phase cultures grown in LS4D and LS4 media using 
techniques described previously (34). Briefly, after removing probes with 
a second-best BLAT hit of 50 or more nucleotides to avoid cross- 
hybridization, we collected data for over 2 million 60-mer probes that 
covered both strands of the genome with a 6-nucleotide step size. We 
computed normalized log levels with a model that takes into account a 
genomic control and nucleotide content, as described previously (34). 
After removing the probes with the lowest 1% intensities in the genomic 
DNA control, we adjusted the normalized expression values so that their 
median was 0. 

We prepared a 5' RNA-Seq library with mRNA from a mid-log-phase, 
LS4D culture of D. alaskensis G20, using previously described techniques 
(34). Briefly, we treated the mRNA with terminator 5'-phosphate-dependent
exonuclease (Epicentre) to remove partially degraded transcripts,
converted 5' triphosphates to monophosphates, and ligated an 
RNA sequencing adaptor. After cDNA synthesis, we enriched for products 
that contained adaptors on both ends by PCR and purified the library 
using Ampure DNA XP beads (Beckman). We sequenced 40 nucleotides 
(Illumina GA IIx) and aligned 18 million reads to the D. alaskensis G20 
genome with ELAND (Illumina). 

Promoter motif analysis. To identify D. alaskensis G20 promoter sequence
motifs, we analyzed a preliminary set of 1,172 TSSs that had at least 
50 5' RNA-Seq reads and showed a sharp rise in normalized log2 intensity 
in the tiling microarray data from LS4D (34, 58). For each preliminary 
TSS, we extracted the sequence from -40 to +1 on the transcribed strand 
and searched for motifs using MEME 3.5 with a motif width of 30 to 40 
nucleotides and the zero-or-one-occurrence per sequence (zoops) model (59). 
We used Patser (60) to score every location in the genome for how well it 
matched the significant σ70 motif and the σ54 motif from RegPrecise (21). 
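
Extracting the -40 to +1 window on the transcribed strand can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function name is hypothetical, coordinates are taken to be 1-based on the forward strand, and the chromosome is treated as linear, so windows that would wrap around the origin are rejected.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def promoter_window(genome, tss, strand, upstream=40):
    """Return the sequence from -upstream to +1 on the transcribed strand.

    genome: forward-strand sequence; tss: 1-based coordinate of +1;
    strand: '+' or '-'. There is no position 0, so the window holds
    upstream + 1 nucleotides.
    """
    if strand == "+":
        start, end = tss - upstream, tss      # 1-based, inclusive
    elif strand == "-":
        start, end = tss, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 1 or end > len(genome):
        raise ValueError("window runs off the end of the sequence")
    window = genome[start - 1:end]
    if strand == "-":
        # Reverse complement so the result reads 5' to 3' on the transcribed strand.
        window = window.translate(COMPLEMENT)[::-1]
    return window

# Hypothetical 101-nt sequence with a single G at position 51.
toy_genome = "A" * 50 + "G" + "C" * 50
plus_window = promoter_window(toy_genome, 51, "+")
minus_window = promoter_window(toy_genome, 51, "-")
```

In both orientations the last character of the returned window corresponds to the TSS itself, which keeps the sequences aligned for motif discovery.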

Identification of high-confidence TSSs. We considered any location 
with at least 50 reads in the 5' RNA-Seq data and with more reads than surrounding
locations (up to 25 nucleotides away) as a potential transcription start 
site (TSS). To classify these 14,844 candidates as genuine TSSs, we con- 
sidered the number of reads, whether the tiling data showed a sharp rise at 
that location (34, 58), and the strength of any promoter motif upstream of 
the TSS. We combined these sources of information with a semisuper- 
vised machine learning approach: to generate training data for each data 
source, we used the other two data sources to label potential TSS locations 
as likely or unlikely to be genuine TSSs (34). We used these training data 
to infer a statistical model for each source of information. Each statistical 
model converts the raw score(s), such as how well the TSS matches a 
promoter motif or the number of 5' RNA-Seq reads, to an estimate of the 
log odds, log [P(Score|TSS)/P(Score|not TSS)], based on how often that 
score occurs in the likely-TSS or unlikely-TSS training sets. For each tiling 
experiment, we used two different features — the difference in log intensity 



between the regions on either side of the putative TSS and the local cor- 
relation to a step function (58) — so that we had four tiling features. We 
built a statistical model for each tiling feature separately and then com- 
bined the log odds for these features by finding the best-fitting linear 
combination (i.e., logistic regression). Then, we added the log odds from 
5' RNA-Seq, tiling, and promoter motifs (i.e., a naive Bayesian classifier). 
Finally, we chose an arbitrary cutoff (log odds >4) to identify high- 
confidence TSSs. Above this cutoff, we obtained 1,313 TSSs in the genuine 
data. When we shuffled the data, by computing tiling features and motif 
features for randomly selected locations, we obtained just 40 locations 
above our threshold (log odds >4). This suggests that the high-confidence 
TSSs include about 40 false positives, or a false discovery rate of 3%
(40/1,313). 
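
The way per-source log odds are estimated, summed, and thresholded, and how the false discovery rate is obtained from the shuffled control, can be sketched as below. This is a simplified illustration, not the authors' implementation: the binning of scores, the pseudocount, the use of natural logarithms, and all function names are assumptions, and the logistic regression that combines the four tiling features is omitted.

```python
import math

def log_odds(score_bin, likely_counts, unlikely_counts, pseudocount=1.0):
    """Estimate log[P(score|TSS)/P(score|not TSS)] from binned training counts."""
    n_bins = len(likely_counts)
    p_tss = (likely_counts[score_bin] + pseudocount) / (
        sum(likely_counts) + pseudocount * n_bins)
    p_not = (unlikely_counts[score_bin] + pseudocount) / (
        sum(unlikely_counts) + pseudocount * n_bins)
    return math.log(p_tss / p_not)

def combined_log_odds(per_source):
    """Naive Bayes: sources are treated as independent, so their log odds add."""
    return sum(per_source)

def is_high_confidence(per_source, cutoff=4.0):
    """Apply the log odds > 4 cutoff described in the text."""
    return combined_log_odds(per_source) > cutoff

def estimated_fdr(n_shuffled_hits, n_genuine_hits):
    """Shuffled locations above the cutoff approximate the false positives."""
    return n_shuffled_hits / n_genuine_hits

# Values from the text: 40 shuffled locations versus 1,313 genuine TSSs.
fdr = estimated_fdr(40, 1313)
```

Summing log odds is equivalent to multiplying likelihood ratios, which is what makes the classifier "naive": any correlation between the 5' RNA-Seq, tiling, and motif evidence is ignored.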

Gene expression profiling of regulatory mutants. We measured gene 
expression in wild-type D. alaskensis G20 and 18 different regulatory mu- 
tants. See Table S2 in the supplemental material for a list of these mutant 
strains and the growth conditions used for expression profiling. For each 
mutant, we verified the correct strain by PCR with a transposon and 
genome-specific primer pair. The regulatory mutants and wild-type 
D. alaskensis G20 were typically grown to mid-log phase and centrifuged 
at 4°C for 10 min at 10,000 × g, and the harvested cells were stored at 
-80°C. For strain JK05048 (transposon mutant in Dde_3291), we transferred
late-log-phase cells growing in MOLS4 to either fresh MOLS4 or 
MOCS4 medium for 1 h before harvesting cells. For strain JK05162 (trans- 
poson mutant in Dde_2665; lysX), we transferred late-log-phase cells 
growing in MOLS4 to either fresh MOLS4, MOLS4 with 0.3 mM lysine, or 
MOYLS4 medium and incubated them for 1 h before harvesting cells. As 
controls for the Dde_3291 and Dde_2665 experiments, we did the same 
1-h incubation experiments with wild-type D. alaskensis G20. RNA isola- 
tion, cDNA synthesis, labeling, hybridization to NimbleGen microarrays, 
and data analysis were performed as previously described (61). For each 
experiment, we set the median of the normalized log2 expression levels to 
zero. 

Mutant pool fitness assays. We designed two pools of D. alaskensis 
G20 transposon mutants, pool 1 with 4,069 strains and pool 2 with 4,056 
strains, such that within each pool, each strain contains a unique Tag- 
Module (54). We constructed and assayed two pools in order to maximize 
the number of unique transposon insertions, as we have more insertion 
mutants than TagModules. Individual transposon mutants were rear- 
rayed from the glycerol stock microplates to new microplates with fresh 
LS4 medium supplemented with G418 (800 µg/ml) using a liquid handling
robot (Beckman Biomek 3000) housed in the anaerobic chamber. 
The fresh cultures were grown for 2 days at 30°C, and all of the individual 
strains were combined using the robot. For each pool, we added glycerol 
to a final concentration of 10% (vol/vol) and stored multiple 1-ml aliquots
at -80°C. During construction of the pools, any D. alaskensis G20 
transposon mutant strains with E. coli contamination were excluded. Ad- 
ditionally, some mutants did not grow at all or grew poorly from the 
original glycerol stocks. For example, some of the mutants selected on 
lactate-sulfite medium did not grow in the lactate-sulfate medium used to 
construct the pools. Lastly, some transposon mutants likely have a 
wrongly assigned TagModule. For these reasons, we do not have fitness 
data for all of the strains in the original pool designs. 

We performed pooled mutant fitness assays as previously described 
(40, 41). The two pools of mutants were grown separately to mid-log 
phase in LS4 at 30°C, and samples of each pool culture were collected as a 
"start" control. The remaining culture was pelleted, washed twice with 
phosphate buffer, and finally resuspended in the same volume of LS4D or 
phosphate buffer (for the choline experiment). We inoculated the pools in 
the selective medium at a starting optical density at 600 nm (OD600) of 
0.02 in 10 ml of medium. After growth of the mutant pool reached satu- 
ration (4 to 6 population doublings), we collected "condition" samples. 
Genomic DNA extraction, DNA bar code amplification, and hybridization
of the DNA tags to the GenFlex 16K_v2 microarray (Affymetrix) were 
performed as described previously (40, 62). For some experiments, we 
hybridized the UPTAGs from pool 1 and the DOWNTAGs from pool 2 to 
a single microarray because the two tags in the TagModule provide redun- 
dant data (54). In this study, we performed pooled fitness assays under the 
following five conditions: LS4D, LS4D with 0.2% (wt/vol) Casamino Acids,
LS4D with 1 µM methionine, MOLS4 without vitamins, and MOCS4 
without vitamins. We excluded the vitamins in the latter two experiments 
because our vitamin solution contained trace amounts of choline chlo- 
ride. 

Data processing, normalization, and calculation of strain and gene 
fitness were performed as described previously (40). Briefly, we calculated 
the fitness of a strain in the pool as the log2 ratio of its bar code signal 
intensity under the condition relative to the start. We averaged the fitness 
values from relevant strains to calculate gene fitness. If a gene had data 
from a central insertion (within the central 5 to 80% of the gene), then 
data from other, edge insertions were not included in the average. In this 
paper, we report only the averaged gene fitness values. We normalized the 
fitness values so that the typical gene had a fitness of zero under each 
condition. 
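
The fitness calculation described above can be sketched in a few lines. This is an illustration, not the published pipeline (see reference 40 for that): the function names are hypothetical, the median is used as one reading of "the typical gene", and gene orientation is ignored when deciding whether an insertion is central.

```python
import math
from statistics import mean, median

def strain_fitness(condition_signal, start_signal):
    """log2 ratio of bar code signal under the condition relative to the start."""
    return math.log2(condition_signal / start_signal)

def is_central(insert_pos, gene_start, gene_end, lo=0.05, hi=0.80):
    """True if the insertion lies within the central 5 to 80% of the gene."""
    frac = (insert_pos - gene_start) / (gene_end - gene_start)
    return lo <= frac <= hi

def gene_fitness(strains):
    """Average strain fitness; strains is a list of (fitness, central) pairs.

    If any central insertion exists, edge insertions are left out of the average.
    """
    central = [fit for fit, is_cen in strains if is_cen]
    values = central if central else [fit for fit, _ in strains]
    return mean(values)

def normalize(gene_to_fitness):
    """Shift all values so that the median gene has a fitness of zero."""
    center = median(gene_to_fitness.values())
    return {gene: fit - center for gene, fit in gene_to_fitness.items()}

# Hypothetical example: a strain whose signal drops fourfold has fitness -2.
example_fitness = strain_fitness(25.0, 100.0)
```

Because fitness is a log2 ratio, a value of -2 corresponds to a fourfold drop in relative abundance over the course of the experiment, and centering removes any global shift shared by all strains in a pool.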

Genetic complementation of Dde_3007. We complemented the methionine
auxotrophy of a transposon mutant in Dde_3007 (strain 
JK00771) by introducing a wild-type copy of Dde_3007 on plasmid 
pMO9075 (63). Our tiling array data suggested that the annotated start 
codon of Dde_3007 was incorrect, and comparative genomics suggested 
that the true start codon was at position 2993462. We cloned a copy of 
Dde_3007 with the revised start codon into pMO9075 using Gibson as- 
sembly and verified the clone, pJK2, by sequencing. Plasmids pMO9075 
and pJK2 were introduced into wild-type D. alaskensis G20 and JK00771 
by electroporation (16) and selected on MOYLS4 plates supplemented 
with 800 µg/ml spectinomycin. 

Microarray data accession numbers. All fitness data are available in 
MicrobesOnline (http://microbesonline.org/). The D. alaskensis G20 gene 
expression data from this study are available in the Gene Expression Om- 
nibus (GEO) under the accession numbers in parentheses: tiling microar- 
ray data (GSE39471), 5' RNA-Seq data (GSE49484), and the regulatory 
mutant data (GSE49530). 